Abstract

Packet reordering is an important property of network traffic that should be cap- 
tured by analytical models of the Transmission Control Protocol (TCP). We study 
a combinatorial problem motivated by RESTORED [1], a TCP modeling methodology
that incorporates information about packet dynamics. A significant component
of this model is a many-to-one mapping B that transforms sequences of packet IDs 
into buffer sequences in a manner that is compatible with TCP semantics. We show 
that the following hold: 

• There exists a linear time algorithm that, given a buffer sequence $W$ of
length $n$, decides whether there exists a permutation $A$ of $\{1, 2, \ldots, n\}$ such
that $A \in B^{-1}(W)$ (and constructs such a permutation, when it exists).

• The problem of counting the number of permutations in $B^{-1}(W)$ has a
polynomial time algorithm.

• We also show how to extend these results to sequences of IDs that contain 
repeated packets. 
1 Introduction



Consider a sequence of TCP packets, identified by their integer IDs, as handled by their 
receiver. The receiver must forward the packet sequence to an application, subject to 
respecting packet sequence integrity. That is, at every moment the IDs of packets 
forwarded to the application must form a contiguous sequence $1, 2, \ldots, m$, for some
$m \ge 1$. Packets can arrive out-of-order and thus need to be buffered. Several copies of
a packet can arrive, but only one copy of a given packet is useful (and will be stored, if 
needed). We assume that the receiver evicts a given packet from the buffer and passes 
it to the application as soon as possible, i.e., as soon as the packet sequence integrity 
constraint is satisfied. 

A given sequence $A = (A_1, \ldots, A_n)$ of packet IDs yields a corresponding
sequence $B(A) = (B_{A,1}, \ldots, B_{A,n})$ representing the evolution of the buffer size. In this
paper we are interested in the following problem: given a sequence of positive integers
$W$, what is the complexity of

1. Deciding whether there exists a permutation $A$ with $W = B(A)$?

2. Counting the number of permutations in the set $B^{-1}(W)$?
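The map $B$ can be made concrete by simulating the receiver described above. The following is a minimal sketch, not code from the paper: the function name `buffer_sequence` and the use of a Python set to hold buffered IDs are illustrative assumptions, and duplicates of already forwarded packets are simply dropped.

```python
def buffer_sequence(ids):
    """Return B(A): the number of buffered packets after each arrival.

    Assumption (illustrative): IDs start at 1 and the receiver forwards
    packets to the application as soon as they become contiguous.
    """
    next_expected = 1      # smallest ID not yet forwarded to the application
    buffered = set()       # out-of-order IDs currently held by the receiver
    sizes = []
    for pid in ids:
        if pid == next_expected:
            next_expected += 1
            # Flush every buffered packet that has become contiguous.
            while next_expected in buffered:
                buffered.remove(next_expected)
                next_expected += 1
        elif pid > next_expected:
            buffered.add(pid)   # a set stores only one copy of a repeated packet
        # pid < next_expected: duplicate of a forwarded packet, discarded
        sizes.append(len(buffered))
    return sizes


print(buffer_sequence([3, 4, 1, 2]))
```

For the ID sequence 3 4 1 2 the buffer holds one packet, then two, stays at two when packet 1 is forwarded directly, and empties when packet 2 arrives.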

2 Motivation 

The problem we described in the introduction arises in the context of analytical model- 
ing of TCP dynamics. Therefore, the reader only interested in the combinatorial aspects 
of the problem can focus on the remaining sections. This section explains in detail the 
motivation for the problem. 

While a lot of attention has been given to modeling the temporal aspects of TCP 
traffic (see e.g. Jaiswal et al. Q), the dynamics of packet IDs has not received the same 
attention. As Bennett et al. have shown, packet reordering is more widespread than 
originally believed, and is increasingly becoming so, due to technological advances 
such as link striping and mobile communications. Packet reordering has many severe 
effects on overall traffic characteristics, hence it is an important component of TCP 
dynamics (we refer the reader to El for further discussion). 

Paper [1] introduced RESTORED, a methodology for semantic compression and
regeneration of large TCP traces. RESTORED is based on the following observation: 
TCP guarantees to deliver an ordered packet stream to the application layer and needs 
to buffer packets that arrive out-of-order. Consequently, the received packets can be 
classified into two types: those that could be immediately passed to the application 
layer, and those that have to be temporarily buffered. A received packet that allows 
the buffer to flush is called a pivot packet. All packets appearing in order are trivially 
pivots. RESTORED divides the received sequence into segments, bounded by pivot 
packets. Segments correspond to one of two phases: 

• An ordered phase, in which no reordering is present, thus there is no need for 
buffering. 
• An unordered phase, in which there is reordering and buffering (see footnote 1). Each
occurrence of this phase ends when a pivot packet is received.

RESTORED preserves packet reordering properties of TCP traffic, up to a notion
of semantic equivalence of packet traces. This notion is called behavioral equivalence 
and can be motivated as follows: 

Definition 1 Let $ACK_i$ be defined as the smallest integer that does not appear among
the first $i$ packet IDs (also, define $ACK_0 = 1$). Parameter $ACK_i$ is called the
acknowledgement (ACK) at stage $i$.

The previous definition relies on the simplifying assumption that in the implementation
of TCP each received packet is ACKed, and that value $ACK_i$ is the only information
carried by the ACK packet. Of course, real-life acknowledgment policies of
TCP can be more complicated lH. 

Consider now the following two packet ID sequences: 4 2 3 1 and 4 3 2 1. 

Both these sequences trigger identical ACK responses, namely 1 1 1 5, i.e., we
arrive at the following two mappings:

$$4\,2\,3\,1 \mapsto 1\,1\,1\,5, \qquad 4\,3\,2\,1 \mapsto 1\,1\,1\,5. \qquad (1)$$
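The two mappings can be checked mechanically from Definition 1. The sketch below is only an illustration under the simplifying assumption stated above (every received packet is ACKed, and the ACK carries nothing but $ACK_i$); the name `ack_sequence` is hypothetical.

```python
def ack_sequence(ids):
    """Return (ACK_1, ..., ACK_n) as in Definition 1.

    ACK_i is the smallest positive integer that does not appear among
    the first i packet IDs.
    """
    seen = set()
    ack = 1                 # ACK_0 = 1
    acks = []
    for pid in ids:
        seen.add(pid)
        # Advance past every ID that has now been received.
        while ack in seen:
            ack += 1
        acks.append(ack)
    return acks


print(ack_sequence([4, 2, 3, 1]))
print(ack_sequence([4, 3, 2, 1]))
```

Both calls print the same list, which is what makes the two ID sequences indistinguishable to the sender.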

Since TCP is a receiver-driven protocol, assuming identical network conditions, 
and discounting possible differences in the value of the congestion window at the be- 
ginning of the sequences, the two ID sequences trigger identical responses from the 
receiver, and should thus be regarded as indistinguishable from the standpoint of TCP 
dynamics. 

Definition 2 Two sequences of packets $P$ and $Q$ are behaviorally equivalent (written
$P \equiv_{beh} Q$) if they lead to the same sequences of ACKs.

In practice one might want a notion of equivalence that is even more restrictive 
than behavioral equivalence. This was, for instance, the case of RESTORED. Its original 
motivation was to provide a way to compress TCP traces and estimate various measures 
of quality of service of the original traces by reconstructing "compatible" sequences. 
Many measures of packet reordering have been proposed in the networking literature 
GJ|6]E1]- Given such a measure M, one way to guarantee that sequences produced by 
RESTORED resemble the original sequence with respect to measure $M$ is:

1. Identify an equivalence notion of ID sequences $\equiv$ such that $M$ is consistent with
respect to $\equiv$, that is

$$(\forall A, B):\ (A \equiv B) \Rightarrow (M(A) = M(B)). \qquad (2)$$

Footnote 1: A technical assumption we will employ is that duplicates of packets that have already been uploaded to
the application layer are discarded. This is a sensible assumption, given TCP behavior.
2. Make sure that for any sequence $A$, the sequence $R(A)$ regenerated by
RESTORED satisfies $R(A) \equiv A$.

(See also [8] for more discussion and clarification.) Behavioral equivalence might be
too coarse (as an equivalence relation) to guarantee consistency of many reordering
metrics and, thus, needs to be refined. In a companion paper [9] we have considered
such an equivalence notion, based on the following notion of buffer size:

Definition 3 Let $A = (A_1, A_2, \ldots, A_n)$ be a sequence of packet IDs. We define
FB as an operator that, after receiving a packet $A_i$ at time index $i$, outputs the difference
between the highest ID ($H_i$) seen so far and the highest ID ($L_i$) that could be uploaded:

$$FB(A_i) = H_i - L_i. \qquad (3)$$

In other words, FB is the size of the smallest buffer large enough to store all packets 
that arrive out-of-order, where the definition of size accounts for reserving space for 
unreceived packets with intermediate IDs as well. The buffer sequence $FB(P)$ associated
with a sequence $P$ of packet IDs is simply a time-series of FB values computed
after each packet has been received.

Two sequences of packet IDs $P$ and $Q$ are FB equivalent (written $P \equiv_{FB} Q$) if
$FB(P) = FB(Q)$.
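A short sketch of the FB operator may help fix the definition. It is not taken from [9]; it assumes that $L_i$ equals $ACK_i - 1$, i.e. the largest ID such that all packets up to it have arrived, and the name `fb_sequence` is hypothetical.

```python
def fb_sequence(ids):
    """Return the time series FB = H_i - L_i after each arrival.

    Assumption (illustrative): L_i is taken to be ACK_i - 1, the highest
    ID that could be uploaded to the application after stage i.
    """
    seen = set()
    highest_seen = 0        # H_i
    next_expected = 1       # ACK_i, so L_i = next_expected - 1
    values = []
    for pid in ids:
        seen.add(pid)
        highest_seen = max(highest_seen, pid)
        while next_expected in seen:
            next_expected += 1
        values.append(highest_seen - (next_expected - 1))
    return values


print(fb_sequence([3, 4, 1, 2]))
print(fb_sequence([2, 3, 3, 1]))
```

Unlike a plain count of buffered packets, this quantity also reserves room for the missing intermediate IDs, so the two sequences above produce different FB series.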

This definition is directly related to the semantics of TCP, since it preserves quantities
such as the size of the AdvertisedWindow (see [10]). Inverting the mapping FB can
be done in polynomial time [9]. However, the complexity of computing the cardinality
of the preimage $FB^{-1}(W)$ was left open, and was only solved in two special cases.

In this paper, we use a different notion, introduced below, for which more precise 
results can be obtained. 

Definition 4 Buffer size is the smallest size of a buffer that can store all out-of-order
packets. Two sequences of packets $P$ and $Q$ are buffer equivalent (written $P \equiv_{buf} Q$)
if $B(P) = B(Q)$, that is, the sequences of buffer sizes associated with receiving $P$ and
$Q$ are identical.

From a combinatorial perspective, buffer equivalence is more natural than FB equiv- 
alence. Its relation with behavioral equivalence is, however, slightly more complicated: 

1. Buffer equivalence is not a refinement of behavioral equivalence in general. Indeed,
sequences of packet IDs 2 3 3 1 and 3 4 1 2 are buffer equivalent (they
both map to sequence 1 2 2 0) but not behaviorally equivalent (the ACKs are 1 1
1 4 and 1 1 2 5, respectively). This stands in contrast to FB equivalence, which is
indeed [9] a refinement of behavioral equivalence.

2. Buffer equivalence refines behavioral equivalence when restricted to permutations
(sequences with no repeats or lost packets). For a formal statement and
proof of this claim see Proposition 1 below.

3. Finally, buffer equivalence is incomparable (as an equivalence notion) with FB
equivalence [8].
On the other hand, there exist reordering metrics $M$ defined in the networking literature
(e.g. reorder buffer density [11]) with the following properties:

1. $M$ only depends on packets received for the first time, and not on repeat packets.

2. M is inconsistent with respect to FB equivalence but consistent with respect to 
buffer equivalence (metrics with opposite consistency properties exist as well; 
see O for further details). 

The recovery of such metrics via the argument described in equation (2) motivates
the problem we study in this note: inverting the many-to-one map $B$ and counting the
size of its preimage. Results for map $B$ are slightly stronger than those proven in [9]
for map FB. Namely, computing the cardinality of the preimage of map $B$, as well as
returning one element from the preimage, can be done in polynomial time (even linear
time for the latter problem).

3 Preliminaries 

We will use notation $x \dot{-} y = \max\{x - y, 0\}$.

We employ standard graph theoretic notions throughout. In this paper, graphs are 
always bipartite and undirected. Denote by d(v) the degree of vertex v and by N(v) 
the set of neighbors of v. 

Definition 5 A bipartite graph $G = (V_1, V_2, E)$ is doubly convex if there exist
permutations $\pi_1, \pi_2$ of vertex sets $V_1, V_2$, respectively, such that for every $i \in \{1, 2\}$ and
every vertex $v \in V_i$ the set of vertices $w$ that are adjacent to $v$ forms an interval (i.e. a
set of consecutive nodes) of $\pi_{3-i}(V_{3-i})$.

Definition 6 A sequence of IDs $W$ is a valid buffer pattern if there exists a permutation
$A$ of $\{1, 2, \ldots, |W|\}$ such that $B(A) = W$.

Note that any valid buffer pattern $W$ necessarily ends in a zero, since for $A \in B^{-1}(W)$
all packets in $A$ can be passed to the application layer when the last packet in
A is received. Also, without loss of generality, one can assume that the only position 
in a valid buffer pattern that is equal to zero is the last one, since one can decompose 
a given pattern W into disjoint segments, bounded by those positions equal to zero 
(where the buffer, therefore, gets flushed). To each such segment one can associate a 
permutation of a contiguous set of IDs. 

4 Inverting Buffer Sequences 

Our main result is 

Theorem 4.1 The following are true: 
1. There is an algorithm that, given an encoding $W = W_1 \# W_2 \# \cdots \# W_n \#\#$ of
a sequence of positive integers as input (the $W_i$'s are integers in binary notation
and $\#$ is a new symbol), decides in time $O(|W|)$ whether $W$ is a valid buffer pattern,
and if this is the case constructs a permutation $A$ such that $A \in B^{-1}(W)$.

2. Counting the cardinality of the set of permutations in the preimage $B^{-1}(W)$ can
be done in polynomial time.

Proof. 

We will provide, in essence, a reduction of the problem above to the problem of 
finding a maximum matching in a special class of doubly convex bipartite graphs [12].
The complexity of this problem is linear in the number of vertices of the graph [12].
Since the size of the bipartite graph that is created by reduction is linear, the overall 
complexity of the problem is linear. 

A valid buffer sequence consists of positive integers, with the exception of the last
entry, which is zero. Any two consecutive values of the buffer sequence, $W_{i-1}$ and $W_i$,
can only be in one of the following situations:

1. $W_i = W_{i-1} + 1$. This situation corresponds to one new out-of-order packet
being received at stage $i$. This holds for $i = 1$ as well, if we let $W_0 = 0$.

2. $W_i < W_{i-1}$. This situation corresponds to the newly received packet causing a
non-empty portion of the buffer to be flushed. In particular the ID of the received
packet can be inferred at this stage, and is equal to the smallest index of a packet
not received so far.

3. $W_i = W_{i-1}$. This situation corresponds to the packet received at this stage being
the first packet not previously received. Receiving this packet does not cause any
other packet to be sent to the application layer.

If the input sequence fails to satisfy these conditions (for instance if there exists
an index $i$ with $W_i - W_{i-1} > 1$) then the set of permutations in $B^{-1}(W)$ is empty.
Otherwise, let $S_1, S_2, S_3$ be the sets of indices corresponding to the three cases listed
above.

During the course of the algorithm we will keep track of the value $ACK_i$, computed
assuming that $W$ is a valid buffer pattern. Initially $ACK_0 = 1$. We have the following
recurrence relations (mirroring the three cases described above):

1. The newly received packet is out-of-order. Thus, it does not change the value of
parameter ACK. Therefore

$$ACK_i = ACK_{i-1}. \qquad (4)$$

2. The newly received packet has ID $ACK_{i-1}$. In addition, it makes the buffer
shrink in size from $W_{i-1}$ to $W_i$, which means that

$$ACK_i = ACK_{i-1} + W_{i-1} - W_i + 1. \qquad (5)$$

3. The newly received packet has index $ACK_{i-1}$ and does not cause the buffer to
shrink any more. Therefore

$$ACK_i = ACK_{i-1} + 1. \qquad (6)$$

For all indices $i \in S_2 \cup S_3$, the index of the received packet is uniquely determined,
and equal to $ACK_{i-1}$.
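The three recurrences can be turned into a single left-to-right pass over $W$. The following sketch is an illustration only: the function name `acks_from_buffer_sequence` and the choice to return `None` on a failed check are assumptions, and only the local step conditions listed above are tested here.

```python
def acks_from_buffer_sequence(w):
    """Recover [ACK_0, ..., ACK_n] from w = [W_1, ..., W_n].

    Returns None when some step violates the three admissible cases
    (an increase of more than one, or a negative entry).
    """
    acks = [1]      # ACK_0 = 1
    prev = 0        # W_0 = 0
    for cur in w:
        if cur < 0 or cur - prev > 1:
            return None
        if cur == prev + 1:
            # Case 1: out-of-order packet, ACK unchanged (equation (4)).
            acks.append(acks[-1])
        elif cur < prev:
            # Case 2: packet ACK_{i-1} arrives and flushes the buffer (equation (5)).
            acks.append(acks[-1] + prev - cur + 1)
        else:
            # Case 3: packet ACK_{i-1} arrives, nothing is flushed (equation (6)).
            acks.append(acks[-1] + 1)
        prev = cur
    return acks


print(acks_from_buffer_sequence([1, 2, 2, 0]))
```

For the pattern 1 2 2 0 the recovered values $ACK_1, \ldots, ACK_4$ are 1 1 2 5, matching the ACKs of the ID sequence 3 4 1 2.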

We will now create a bipartite graph $G = (V_1, V_2, E)$. Nodes in $V_1$ correspond
to stage indices $i \in \{1, \ldots, n\}$. Nodes in $V_2$ will correspond to packet IDs. First,
let $V_1 = S_1$ and let $V_2 = \{1, \ldots, n\} \setminus \{ACK_{i-1} : i \in S_2 \cup S_3\}$. Clearly
$|V_2| = n - |S_2 \cup S_3| = |S_1| = |V_1|$. Second, given node $i \in V_1$, add edges to all vertices
$j \in V_2$ such that $j > ACK_i$.
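As an illustration of this construction, the sketch below builds $V_1$, $V_2$ and the neighborhoods explicitly. It is deliberately naive (quadratic in $n$) and is not the linear-time procedure of the paper; the name `build_graph` and the returned dictionary layout are assumptions.

```python
def build_graph(w):
    """Return (V1, V2, neighbors) for w = [W_1, ..., W_n].

    V1 holds the stages in S_1, V2 the packet IDs not forced by stages
    in S_2 or S_3, and neighbors[i] lists every j in V2 with j > ACK_i.
    """
    n = len(w)
    ack = [1]                   # ack[i] = ACK_i, with ACK_0 = 1
    v1, forced = [], set()
    prev = 0                    # W_0 = 0
    for i in range(1, n + 1):
        cur = w[i - 1]
        if cur == prev + 1:     # stage in S_1: the ID is still undetermined
            v1.append(i)
            ack.append(ack[-1])
        elif 0 <= cur <= prev:  # stage in S_2 or S_3: the ID is ACK_{i-1}
            forced.add(ack[-1])
            ack.append(ack[-1] + prev - cur + 1)
        else:
            raise ValueError("not a valid buffer pattern")
        prev = cur
    v2 = [j for j in range(1, n + 1) if j not in forced]
    neighbors = {i: [j for j in v2 if j > ack[i]] for i in v1}
    return v1, v2, neighbors


print(build_graph([1, 2, 1, 2, 0]))
```

Because $ACK_i$ is nondecreasing in $i$, later stages have smaller neighborhoods, each an upper segment of $V_2$; this nesting is what the matching argument below exploits.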

With this definition we have: 

Lemma 4.1 Permutations from the set $B^{-1}(W)$ are in bijective correspondence with
elements of $MATCH(G)$, the set of all perfect matchings in $G$. In particular $B^{-1}(W) \neq \emptyset$
if and only if $G$ has a perfect matching.

Proof. 

Each permutation $\sigma$ can be seen as a set of pairs $(i, \sigma[i])$. By the previous discussion,
the set of acknowledgements $\{ACK_i\}_{i \ge 0}$ is the same for any permutation in $B^{-1}(W)$.
Moreover, for all $\sigma \in B^{-1}(W)$ and index $i \in S_2 \cup S_3$, $\sigma[i] = ACK_{i-1}$. Also, for
such a permutation $\sigma$, by definition of graph $G$ it is easy to see that all pairs $(i, \sigma[i])$
with $i \in S_1$ are edges in $G$. Hence $\sigma$ corresponds to a perfect matching in $G$.

Conversely, every perfect matching $M$ in $G$ naturally corresponds to a sequence of
pairs, that can be completed (by adding all pairs $(i, ACK_{i-1})$ for all values $i$ not in
$V_1$) to a mapping $A$ defined on $\{1, \ldots, n\}$. $A$ is actually a permutation. Indeed, the
values of parameter $ACK_{i-1}$, $i \in S_2 \cup S_3$, are all different, and are not included in $V_2$.
It follows that $A$ maps $n$ numbers onto $n$ different numbers, hence it is a bijection.

To show that $A \in B^{-1}(W)$, assume that this was not the case, and let $i$ be the
smallest index such that $B_{A,i} \neq W_i$. Thus $B_{A,i-1} = W_{i-1}$ where, by convention,
$B_{A,0} = 0$.

Case 1: $B_{A,i} = B_{A,i-1} + 1$. Since $W_i \neq B_{A,i}$ and $W_i - W_{i-1} \le 1$, the only
possible alternatives are $W_i = W_{i-1}$ or $W_i < W_{i-1}$. But then index $i$ is not in
$V_1$ and is matched in $A$ to integer $ACK_{i-1}$. This contradicts the assumption that
$B_{A,i} = B_{A,i-1} + 1$, since the packet with ID $ACK_{i-1}$ is the first not received in the
first $i - 1$ phases, and can thus be uploaded at stage $i$. The contradiction comes from
our assumption that sequences $B(A)$ and $W$ are different.

Similar arguments can be applied in the two remaining cases for the evolution of 
sequence $B(A)$, and the conclusion of the argument is that $A \in B^{-1}(W)$.

□ 

Lemma 4.2 Let $a_1 \ge a_2 \ge \ldots \ge a_m$ be the number of ones on the first, second,
..., $m$'th row of $M_G$, the adjacency matrix of $G$ (call $(a_1, \ldots, a_m)$ the type of $M_G$).
Then we have
1. $G$ has a perfect matching if and only if for all $i = 1, \ldots, m$, $a_i \ge m + 1 - i$.
When this condition holds, a perfect matching in $G$ can be constructed by taking
elements on the diagonal of $M_G$.

2. The number of perfect matchings in $G$ is given by

$$|MATCH(G)| = a_m (a_{m-1} \dot{-} 1)(a_{m-2} \dot{-} 2) \cdots (a_1 \dot{-} (m - 1)). \qquad (7)$$

Proof. 

Denote the cardinality of set $MATCH(G)$ by $\Gamma(a_1, \ldots, a_m)$ (to highlight its
dependency on parameters $a_1, \ldots, a_m$). Expand the permanent of $M_G$ along the last row. Since
$a_1, \ldots, a_{m-1}$ are all greater than or equal to $a_m$, it follows that $\Gamma(a_1, \ldots, a_m)$ is the sum
of the permanents of $a_m$ minors, all of them of type $(a_1 \dot{-} 1, \ldots, a_{m-1} \dot{-} 1)$. Thus,
$\Gamma(a_1, \ldots, a_m) = a_m \cdot \Gamma(a_1 \dot{-} 1, \ldots, a_{m-1} \dot{-} 1)$, and formula (7) immediately follows
by noting that, for all $i \ge 1$, $(a \dot{-} (i - 1)) \dot{-} 1 = a \dot{-} i$.



□ 
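Formula (7) is a plain product once the type is known. The sketch below evaluates it with truncated subtraction, so the result is zero exactly when some factor vanishes; the function name `count_perfect_matchings` is hypothetical, and the input is assumed to be the list of row sums of a matrix whose rows are nested in the way described for $G$.

```python
def count_perfect_matchings(row_counts):
    """Evaluate formula (7) for a type (a_1, ..., a_m).

    The counts are sorted in nonincreasing order first, so they may be
    passed in any order. Each factor is a_i monus (m - i), i.e.
    max(a_i - (m - i), 0).
    """
    a = sorted(row_counts, reverse=True)
    m = len(a)
    total = 1
    for i, a_i in enumerate(a, start=1):
        total *= max(a_i - (m - i), 0)
    return total


print(count_perfect_matchings([3, 3, 2]))
```

For the type (3, 3, 2) the factors are 1, 2 and 2. For the type (1, 1) the first factor is zero, reflecting that two rows competing for a single column admit no perfect matching.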

We now complete the proof of Theorem 4.1.

1. Algorithm TwoStageGreedy in Figure 1 produces a perfect matching (if it exists).
Its correctness follows from the recurrence relations for parameter $ACK_i$ and
Lemma 4.2 (2). With a little care the algorithm can be implemented in $O(|W|)$
time (using $O(|W|)$ additional memory) as follows:

(a) We use two buffers, $P$ and $Q$, each for $\lceil \log_2(n) \rceil$ integers. They are intended
to hold numbers $W_i$ and $W_{i-1}$. The for-loop can be implemented
by simply scanning the input from left to right, copying the correct information
into buffers $P$ and $Q$. Only two buffers are needed, provided we
keep switching roles of $P$ and $Q$ (they will alternately keep the last value
$W_i$). All test conditions in the algorithm involving these numbers, as well
as computing $W_i - W_{i-1}$, will be performed using buffers $P$ and $Q$, and
can be accomplished by scanning these buffers $C$ times, for some fixed
constant $C$.

(b) The final for loop can be implemented in linear time by scanning buffer $\sigma$
from left to right, using an additional counter for the value of index $j$.

(c) In the algorithm we keep incrementing several counters. The problem of
incrementing counters is well-known to have linear time algorithms via
amortized analysis [13].

2. Computing $|MATCH(G)|$ using formula (7) can be done in polynomial time
as follows:

(a) First, there is a linear time algorithm that, given input $W$, outputs the list
of numbers $a_1, \ldots, a_m$.
(b) Given these numbers, computing $|MATCH(G)|$ can be accomplished in
time polynomial in $m + \lceil \log_2 |MATCH(G)| \rceil$ by the brute-force product
computation in (7). Since $|MATCH(G)| \le n!$ (simply because matchings
correspond to permutations), it follows by Stirling's approximation
that $\lceil \log_2 |MATCH(G)| \rceil = O(n \log n)$. Thus, the running time is polynomial
in $|W|$.

□ 

The proof of Theorem 4.1 also implies that buffer equivalence is a refinement of
behavioral equivalence for permutations:

Proposition 1 Let $P$ and $Q$ be two permutations such that $P \equiv_{buf} Q$. Then $P \equiv_{beh} Q$.

Proof. 

Equations (4)-(6) show that the value of parameter $ACK_i$ can be recovered directly
from the buffer sizes. Since P and Q are buffer equivalent, they have identical buffer 
size sequences and, consequently, identical sequences of parameter ACKi. But it is 
easy to see that the sequence of packet IDs (more precisely the corresponding sequence 
of byte IDs) ACKed by the TCP protocol in the case of simple consecutive ACKs is 
precisely ACKi. Therefore P and Q are behaviorally equivalent. 

□ 

5 Reconstructing Packet Sequences with Repeats 

Buffer equivalence is not a refinement of behavioral equivalence in the presence of 
repeats. The reason is that one cannot distinguish between the case when the newly 
received packet is a repeat and Case 3 in the proof of Theorem 4.1 (in both cases the
buffer size stays the same). However, for a repeat packet the value of the ACK pa- 
rameter does not change, while for a packet in Case 3 the value of the ACK parameter 
increases by one. 

One can modify the notion of buffer equivalence (in a somewhat artificial way) to 
incorporate information whether the received packet is a repeat or not. For instance, 
one can define $B_{A,i}$ to be minus the buffer size when the $i$'th received packet is a repeat.
Denote this new mapping by $\tilde{B}$.

Definition 7 Two sequences of packets $P$ and $Q$ are modified buffer equivalent (written
$P \equiv_{w} Q$) if $\tilde{B}(P) = \tilde{B}(Q)$.

The analog of Theorem 4.1 for mapping $\tilde{B}$ is

Theorem 5.1 Let $W = W_1 \# W_2 \# \cdots \# W_n \#\#$ be a sequence of integers.

Deciding whether $W$ is a valid buffer pattern, and in this case constructing an
ID sequence $A$ such that $A \in \tilde{B}^{-1}(W)$, can be done in linear time. Counting the
cardinality of the preimage $\tilde{B}^{-1}(W)$ can be done in polynomial time.
Algorithm TwoStageGreedy(W)

INPUT: a vector W = W_1 # W_2 # ... # W_n ## of nonnegative integers.

Let σ be a vector of n numbers of length ⌈log_2(n)⌉, initially all zero.
Let ACK be a vector of n + 1 numbers of length ⌈log_2(n)⌉, initially all zero,
with the exception of ACK_0 = 1.
Let chosen be an n-bit vector, with all positions initially zero.
Let W_0 = 0.

for i = 1 to n
    if (W_i - W_{i-1} > 1) ∨ ((i < n) ∧ (W_i = 0)) ∨ ((i = n) ∧ (W_n ≠ 0))
        reject
    else
        if (W_i = W_{i-1})
            let σ[i] = ACK_{i-1};
            let chosen[ACK_{i-1}] = 1;
            let ACK_i = ACK_{i-1} + 1;
        else
            if (W_i < W_{i-1})
                let σ[i] = ACK_{i-1};
                let chosen[ACK_{i-1}] = 1;
                let ACK_i = ACK_{i-1} + W_{i-1} - W_i + 1;
            else
                /* W_i = W_{i-1} + 1 */
                let ACK_i = ACK_{i-1};

for i = 1 to n
    if (σ[i] = 0)
        let σ[i] = the first j ≥ ACK_{i-1} + 1 with chosen[j] = 0;
        let chosen[σ[i]] = 1;
return σ.

Figure 1: Algorithm for inverting buffer sequences
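The pseudocode of Figure 1 translates almost line by line into Python. The version below is a hedged sketch rather than the authors' implementation: it uses ordinary Python integers instead of the bit-level buffers discussed in the proof, merges the two cases in which the packet ID is forced (equation (5) reduces to equation (6) when the buffer size is unchanged), marks each ID as chosen in the second pass, and returns `None` instead of rejecting. The name `two_stage_greedy` is illustrative.

```python
def two_stage_greedy(w):
    """Return a permutation sigma with B(sigma) = w, or None if w is rejected.

    w = [W_1, ..., W_n]; the result lists sigma[1..n] as a 0-based Python list.
    """
    n = len(w)
    sigma = [0] * (n + 1)           # 1-based; 0 means "not decided yet"
    ack = [1] * (n + 1)             # ack[i] = ACK_i, with ACK_0 = 1
    chosen = [False] * (n + 2)      # chosen[j]: ID j is already assigned
    prev = 0                        # W_0 = 0

    # Stage 1: validate the pattern and fix every forced packet ID.
    for i in range(1, n + 1):
        cur = w[i - 1]
        if cur < 0 or cur - prev > 1 or (i < n and cur == 0) or (i == n and cur != 0):
            return None
        if cur == prev + 1:
            ack[i] = ack[i - 1]                     # out-of-order packet
        else:
            sigma[i] = ack[i - 1]                   # forced: next expected ID
            chosen[ack[i - 1]] = True
            ack[i] = ack[i - 1] + prev - cur + 1
        prev = cur

    # Stage 2: greedily give each remaining stage the smallest free ID
    # that is at least ACK_{i-1} + 1. The pointer j never moves backwards,
    # because the thresholds ACK_{i-1} are nondecreasing in i.
    j = 1
    for i in range(1, n + 1):
        if sigma[i] == 0:
            j = max(j, ack[i - 1] + 1)
            while j <= n and chosen[j]:
                j += 1
            if j > n:
                return None
            sigma[i] = j
            chosen[j] = True
    return sigma[1:]


print(two_stage_greedy([1, 2, 2, 0]))
```

On the pattern 1 2 2 0 the first pass fixes the last two positions to IDs 1 and 2, and the second pass fills the first two positions with the smallest remaining IDs, 3 and 4.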
We only outline the proof, since it is very similar to that of Theorem 4.1. Given our
use of negative numbers in the encoding, we no longer have the positivity constraint 
for elements of the candidate sequence W. However, we still require that only the last 
element be zero. 

The construction of graph G is identical to that in the previous case, since in all 
stages in $V_1$ we can guarantee that a new packet is received. However, we do not have
a parsimonious reduction of ID sequences to perfect matchings, since repeat packets 
can complete a matching in G in more than one way. 

A polynomial-time counting algorithm exists, nevertheless, since we can comple- 
ment Lemma POl with 

Lemma 5.1 We have

$$|\tilde{B}^{-1}(W)| = |MATCH(G)| \times \Big( \prod_{i \in R} |W_i| \Big), \qquad (8)$$

where $MATCH(G)$ is the set of all perfect matchings in $G$, and $R = \{i \mid W_i < 0\}$,
i.e. the set of stages in which a repeat packet arrives. In particular $\tilde{B}^{-1}(W) \neq \emptyset$ if
and only if $G$ has a perfect matching.
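Formula (8) can be evaluated by combining the product over repeat stages with formula (7). The sketch below is an illustration under several assumptions that the outline above does not spell out: a repeat is a second copy of a packet currently in the buffer, so the buffer size is unchanged at a repeat stage; the distinct IDs are $1, \ldots, n - |R|$; and the graph $G$ is built from the non-repeat stages exactly as before. The name `count_modified_preimage` is hypothetical.

```python
def count_modified_preimage(w):
    """Count ID sequences mapping to the modified buffer pattern w.

    A negative entry W_i marks a repeat stage with buffer size |W_i|.
    Returns 0 when w fails the checks assumed in the lead-in.
    """
    n = len(w)
    ack, prev = 1, 0                # ACK_0 = 1, W_0 = 0
    thresholds = []                 # ACK_i for each stage in S_1
    forced = set()                  # IDs fixed by stages in S_2 or S_3
    repeat_factor, repeats = 1, 0
    for i, cur in enumerate(w, start=1):
        size = abs(cur)
        if size == 0 and i < n:
            return 0                # only the last element may be zero
        if cur < 0:                 # repeat: any of the buffered packets
            if size != prev:
                return 0
            repeat_factor *= size
            repeats += 1
        elif size == prev + 1:      # new out-of-order packet
            thresholds.append(ack)
        elif size <= prev:          # packet ACK_{i-1}, possibly flushing
            forced.add(ack)
            ack += prev - size + 1
        else:
            return 0
        prev = size
    if prev != 0:
        return 0
    free_ids = [j for j in range(1, n - repeats + 1) if j not in forced]
    # Row sums of the adjacency matrix of G, then formula (7).
    a = sorted((sum(1 for j in free_ids if j > t) for t in thresholds), reverse=True)
    m = len(a)
    matchings = 1
    for i, a_i in enumerate(a, start=1):
        matchings *= max(a_i - (m - i), 0)
    return matchings * repeat_factor


print(count_modified_preimage([1, 2, -2, 0]))
```

For the pattern 1 2 -2 0 the two out-of-order packets (IDs 2 and 3) can arrive in either order and the repeat can be a copy of either one, which gives four ID sequences in total.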

Also, the construction shows that modified buffer equivalence is a refinement of 
behavioral equivalence. Indeed, from the sequence of modified buffer sizes one can 
uniquely reconstruct the sequence of acknowledgments. The proof then proceeds just 
as the proof of Proposition 1.